Generating Image Captions Using Bahdanau Attention Mechanism and Transfer Learning

Authors

Abstract

Automatic image caption prediction is a challenging task in natural language processing. Most researchers have used convolutional neural networks as encoders and decoders. However, accurate captioning requires the model to understand the semantic relationships that exist between the various objects present in an image. The attention mechanism performs a linear combination of encoder and decoder states, emphasizing the information most relevant to the visual context. In this paper, we incorporated Bahdanau attention with two pre-trained networks, Visual Geometry Group (VGG16) and InceptionV3, to predict captions for a given image. The pre-trained models serve as encoders, with a Recurrent Neural Network as the decoder. With the help of the attention mechanism, the model is able to provide context and achieve a bilingual evaluation understudy (BLEU) score of 62.5. Our main goal is to compare the performance of the two models on the same dataset.
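To make the additive (Bahdanau) scoring step concrete, here is a minimal NumPy sketch of one attention step over a set of encoder annotations. All dimensions, parameter names (`W1`, `W2`, `v`), and the random initialization are illustrative assumptions, not the paper's actual configuration; in the described system the annotations would come from VGG16 or InceptionV3 features and the query from the RNN decoder state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumptions for illustration): 5 encoder feature vectors
# (e.g. image regions), encoder/decoder width 8, attention width 4.
T, enc_dim, dec_dim, att_dim = 5, 8, 8, 4

encoder_states = rng.normal(size=(T, enc_dim))  # annotations from the CNN encoder
decoder_state = rng.normal(size=(dec_dim,))     # previous RNN decoder hidden state

# Learned parameters, randomly initialized here for the sketch.
W1 = rng.normal(size=(enc_dim, att_dim))
W2 = rng.normal(size=(dec_dim, att_dim))
v = rng.normal(size=(att_dim,))

def bahdanau_attention(encoder_states, decoder_state):
    # Additive score for each position i: v^T tanh(W1 h_i + W2 s)
    scores = np.tanh(encoder_states @ W1 + decoder_state @ W2) @ v
    # Softmax over encoder positions gives the attention weights.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context vector: linear combination of encoder states.
    context = weights @ encoder_states
    return context, weights

context, weights = bahdanau_attention(encoder_states, decoder_state)
print(weights)  # one weight per encoder position, summing to 1
```

The decoder would concatenate `context` with its input embedding at each time step, so the weights shift toward different image regions as each caption word is generated.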


Similar resources

Generating Images from Captions with Attention

Motivated by the recent progress in generative models, we introduce a model that generates images from natural language descriptions. The proposed model iteratively draws patches on a canvas, while attending to the relevant words in the description. After training on Microsoft COCO, we compare our model with several baseline generative models on image generation and retrieval tasks. We demonstr...

Full text

Generating Image Captions using Topic Focused Multi-document Summarization

In the near future digital cameras will come standardly equipped with GPS and compass and will automatically add global position and direction information to the metadata of every picture taken. Can we use this information, together with information from geographical information systems and the Web more generally, to caption images automatically? This challenge is being pursued in the TRIPOD pr...

Full text

Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning

We study how to generate captions that are not only accurate in describing an image but also discriminative across different images. The problem is both fundamental and interesting, as most machine-generated captions, despite phenomenal research progress in the past several years, are expressed in a very monotonic and featureless format. While such captions are normally accurate, they often la...

Full text

Learning Visual Representations using Images with Captions

Current methods for learning visual categories work well when a large amount of labeled data is available, but can run into severe difficulties when the number of labeled examples is small. When labeled data is scarce it may be beneficial to use unlabeled data to learn an image representation that is low-dimensional, but nevertheless captures the information required to discriminate between ima...

Full text

Generating captions without looking beyond objects

This paper explores new evaluation perspectives for image captioning and introduces a noun translation task that achieves comparative image caption generation performance by translating from a set of nouns to captions. This implies that in image captioning, all word categories other than nouns can be evoked by a powerful language model without sacrificing performance on n-gram precision. The pa...

Full text


Journal

Journal title: Symmetry

Year: 2022

ISSN: 0865-4824, 2226-1877

DOI: https://doi.org/10.3390/sym14122681